Past literature on democratization, nationalism, and autocratic regime maintenance illustrates how education invites both risk and reward for non-democratic states. Education is simultaneously linked with pro-democratic attitudes, political disengagement, and autocratic failure. At the same time, autocrats are predicted to be hesitant to invest in disenfranchised populations. However, education has also been found to bolster national loyalty, human capital, and long-term development. Nor is the real-world picture clear: autocratic states display significant variation in educational investment and attainment, as well as varied relationships between education and political participation. I argue that education can sustain or compromise autocratic stability depending on two factors: the ethnic composition of the state and the extent to which the state uses propaganda in schools. Education does not have a uniform effect. It will not instill similarly pro-democratic attitudes across a diverse population, even if the education “treatment” is constant. At the state level, similar educational policies and initiatives across autocratic states can have opposite outcomes, jeopardizing or strengthening autocratic stability. Similarly, at the individual level, increased education can lead individuals to become supportive of or opposed to the autocratic state. My dissertation investigates the factors behind this real-world variation.
My research currently focuses on three inter-related questions:
The dataset construction described below is aimed at Questions #1 and #2.
When does education lead to democratization and autocratic failure? Extensive literature highlights the relationship between education and democratization (Lipset, 1959; Lerner, 1958; Benavot, 1996; Glaeser et al., 2007; Sanborn and Thyne, 2014). My dissertation argues that investigating ethnic politics changes our understanding of this relationship. My theory highlights the risks and rewards of increasing educational attainment in autocratic states, conditional upon pre-existing ethno-political power relations. My primary causal mechanism suggests that increasingly educated, marginalized groups will develop stronger pro-democratic attitudes than similarly educated, advantaged groups; education is therefore more likely to lead to democratization when extended to marginalized groups.
However, all previous literature on education and democratization focuses on the state level and no dataset exists that allows for testing sub-national dynamics. In other words, no information is currently readily available that accounts for ethnic groups’ educational attainment across time/space.
To test sub-national variation, I provide a preliminary and novel dataset on ethnic group educational attainment over time. Ultimately, this should allow for investigating how sub-national variation in educational attainment affects the likelihood of democratization.
The following is an R Markdown document that walks through the creation of the Ethnic Group Education dataset (EGE). The EGE will include all major ethnic groups per country-year (1969-2015) and their educational attainment. No such dataset is readily available. As a preliminary first cut and proof of concept, I first construct the EGE for 35 countries in Africa.
In short, the dataset is constructed by:
The end result is a dataset that contains country-year information on every major ethnic group in 35 African countries and their corresponding:
The following highlights the construction of the dataset, and then provides some preliminary figures/information using the dataset.
library(countrycode)
library(ggplot2)
library(foreign)
library(directlabels)
library(tidyr)
library(plyr) # loaded before dplyr so that dplyr's verbs are not masked by plyr
library(dplyr)
library(reshape2)
library(stargazer)
library(multiwayvcov)
library(miceadds)
library(jtools)
library(readxl)
library(haven)
library(stringr)
library(LEDA)
library(gridExtra)
library(sjlabelled)
setwd("~/Google Drive/Ohio State/Dissertation/Ethnic Group Dataset") #macbook
ab4 <- read_sav("merged_r4_data.sav")
ab5 <- read_sav("merged-round-5-data-34-countries-2011-2013-last-update-july-2015.sav")
ab6 <- read_sav("merged_r6_data_2016_36countries2.sav")
First, Afrobarometer numbers its countries differently in each round. Therefore, I add the country names to the data so that I can use their COW country codes. For each round, I have a corresponding Excel document that lists the country name and Afrobarometer value. I can then use the countrycode package to standardize the values.
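The lookup-and-join step described above can be sketched on toy data. This is a minimal illustration in base R, not the code used to build the dataset: the respondent rows, the `lookup` table, and the COUNTRY numbers are made up, and `merge()` stands in for the `join()` call used in this document.

```r
# Toy respondent data: COUNTRY is the round-specific Afrobarometer number
# (the numbers used here are illustrative)
resp <- data.frame(RESPNO  = c("BEN0001", "BEN0002", "GHA0001"),
                   COUNTRY = c(1, 1, 5))

# Hypothetical lookup table, analogous to ab_country_codes_r4.xlsx,
# with the COW code already attached
lookup <- data.frame(COUNTRY   = c(1, 5),
                     Statename = c("Benin", "Ghana"),
                     ccode     = c(434, 452))

# Left join: keep every respondent, attach country name and COW code
resp <- merge(resp, lookup, by = "COUNTRY", all.x = TRUE)

# Any NA here would flag a COUNTRY value missing from the lookup table
sum(is.na(resp$ccode))
```

Counting the unmatched rows right after the join is a cheap way to catch a round whose country numbering differs from the lookup file.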
ab.ccodes.r4 <- read_excel("ab_country_codes_r4.xlsx")
#head(ab.ccodes.r4) ## Note: these are the country values *as defined in Afrobarometer Wave 4*
ab.ccodes.r4$ccode <- countrycode(ab.ccodes.r4$Statename, "country.name", "cown") # converts the country name to its Correlates of War (COW) country code
ab4 <- join(ab.ccodes.r4, ab4, by = "COUNTRY")
We’ll keep the following information from Afrobarometer Wave 4:
## Age
# Question Number: Q1
# Question: How old are you?
# Variable Label: Q1. Age
# Values: 18-110, 998-999, -1
# Value Labels: 998=Refused to answer, 999=Don't know, -1=Missing
ab4$age <- ab4$Q1
ab4$age <- as.numeric(as.character(ab4$age))
ab4$age[ab4$age == -1] <- NA
ab4$age[ab4$age == 998] <- NA
ab4$age[ab4$age == 999] <- NA
#table(ab4$age)
## Education
# Question Number: Q89
# Question: What is the highest level of education you have completed?
# Variable Label: Education of respondent
# Values: 0-9, 99, 998, -1
# Value Labels: 0=No formal schooling, 1=Informal schooling only (including Koranic schooling), 2=Some primary schooling, 3=Primary school completed, 4=Some secondary school/ high school, 5=Secondary school completed/high school completed, 6=Post-secondary qualifications, other than university e.g. a diploma or degree from polytechnic or college, 7=Some university, 8=University completed, 9=Post-graduate, 99=Don’t know, 998=Refused to answer, -1=Missing data
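ab4$edu <- as.numeric(ab4$Q89)
ab4$edu[ab4$edu == -1] <- NA
ab4$edu[ab4$edu == 99] <- NA
ab4$edu[ab4$edu == 998] <- NA # refused to answer; would otherwise be counted as educated by the dummies below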
#table(ab4$edu)
ab4$primary <- ifelse(ab4$edu >= 3, 1, 0)
ab4$secondary <- ifelse(ab4$edu >= 5, 1, 0)
ab4$tertiary <- ifelse(ab4$edu >= 8, 1, 0)
## Language
# Question Number: Q3
# Question: Which [country] language is your home language?
# Variable Label: Language of Respondent
# Values: See codebook
# Value Labels: See codebook
ab4$language <- ab4$Q3
## Survey Year
# Question: Date of interview
# Variable Label: Date of interview
# Values: 04.03.08 – 31.12.08
# table(ab4$DATEINTR)
# Despite the codebook saying the values are only in 2008, the table indicates that some respondents were interviewed into 2009.
# Therefore, I'll create a new variable that takes the first 4 digits/integers of the DATEINTR variable.
ab4$year <- ab4$DATEINTR
ab4$year <- as.character(ab4$year)
ab4$year <- str_sub(ab4$year, 1, 4)
ab4$year <- as.numeric(ab4$year)
table(ab4$year)
##
## 2008 2009
## 25305 2408
## Ethnic vs. National Identity
# Question Number: Q83
# Question: Let us suppose that you had to choose between being a [Ghanaian/Kenyan/etc.] and being a ________ [R’s Ethnic Group]. Which of the following best expresses your feelings?
# Variable Label: Ethnic or national identity
# Values: 1-5, 7, 9, 998, -1
# Value Labels: 1=I feel only (R’s ethnic group), 2=I feel more (R’s ethnic group) than [Ghanaian/Kenyan/etc.], 3=I feel equally [Ghanaian/Kenyan/etc.] and (R’s ethnic group), 4=I feel more [Ghanaian/Kenyan/etc.] than (R’s ethnic group), 5=I feel only [Ghanaian/Kenyan/etc.], 7=Not applicable, 9=Don’t know, 998=Refused to answer, - 1=Missing data
ab4$identity <- ab4$Q83
ab4$identity[ab4$identity == -1] <- NA
ab4$identity[ab4$identity == 7] <- NA
ab4$identity[ab4$identity == 9] <- NA
ab4$identity[ab4$identity == 998] <- NA # refused to answer
# Urban vs. Rural
## Question Number: URBRUR
## Question: PSU/EA
## Variable Label: Urban or Rural Primary Sampling Unit
## Values: 1-2
## Value Labels: 1=urban, 2=rural
## Note: Answered by interviewer
ab4$rural <- ab4$URBRUR
ab4$rural <- ab4$rural - 1
#1: rural, 0: urban
# Sex
# Question Number: THISINT
# Question: This interview must be with a:
# Variable Label: This interview, gender
# Values: 1, 2
# Value Labels: 1=Male, 2=Female
# Note: Answered by interviewer
ab4$female <- ab4$THISINT
ab4$female <- ab4$female-1
#1: female, 0 : male
# Employed
# Question Number: Q94
# Question: Do you have a job that pays a cash income? Is it full-time or part-time? And are you presently looking for a job (even if you are presently working)?
# Variable Label: Employment status
# Values: 0-5, 9, 998, -1
# Value Labels: 0=No (not looking), 1=No (looking), 2=Yes, part time (not looking), 3=Yes, part time (looking), 4=Yes, full time (not looking), 5=Yes, full time (looking), 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: SAB
ab4$employment <- ab4$Q94
ab4$employment[ab4$employment == -1] <- NA
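ab4$employment[ab4$employment == 9] <- NA
ab4$employment[ab4$employment == 998] <- NA # refused to answer; would otherwise be coded as employed below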
ab4$employed <- ifelse(ab4$employment > 1, 1, 0)
# View on Democracy
# Question Number: Q30
# Question: Which of these three statements is closest to your own opinion?
# Statement 1: Democracy is preferable to any other kind of government.
# Statement 2: In some circumstances, a non-democratic government can be preferable.
# Statement 3: For someone like me, it doesn’t matter what kind of government we have.
# Variable Label: Support for democracy
# Values: 1-3, 9, 998, -1
# Value Labels: 1=Statement 3: Doesn’t matter, 2=Statement 2: Sometimes non-democratic preferable, 3=Statement 1: Democracy preferable, 9=Don’t know, 998=Refused to answer, -1=Missing data
#table(ab4$Q30)
ab4$democracy <- ab4$Q30
ab4$democracy[ab4$democracy == -1] <- NA
ab4$democracy[ab4$democracy == 9] <- NA
ab4$democracy[ab4$democracy == 998] <- NA # refused to answer
#table(ab4$democracy)
# Extent of Democracy in [Country]
# Question Number: Q42A
# Question: In your opinion how much of a democracy is [Ghana/Kenya/etc.] today?
# Variable Label: Extent of democracy
# Values: 1-4, 8, 9, 998, -1
# Value Labels: 1=Not a democracy, 2=A democracy, with major problems, 3=A democracy, but with minor problems, 4=A full democracy, 8=Do not understand question/ do not understand what ‘democracy’ is, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: Ghana 97
# table(ab4$Q42A)
ab4$democracyInCountry <- ab4$Q42A
ab4$democracyInCountry[ab4$democracyInCountry == -1] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 8] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 9] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 998] <- NA # refused to answer
# table(ab4$democracyInCountry)
# Satisfied w/ Democracy in [Country]
# Question Number: Q43
# Question: Overall, how satisfied are you with the way democracy works in [Ghana/Kenya/etc.]? Are you:
# Variable Label: Satisfaction with democracy
# Values: 0-4, 9, 998, -1
# Value Labels: 0=My country is not a democracy, 1=Not at all satisfied, 2=Not very satisfied, 3=Fairly satisfied, 4=Very satisfied, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: Eurobarometer
#table(ab4$Q43)
ab4$satisfiedDemInCountry <- ab4$Q43
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == -1] <- NA
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == 9] <- NA
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == 998] <- NA # refused to answer
#table(ab4$satisfiedDemInCountry)
# Trust in President
# Question Number: Q49A
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: The President?
# Variable Label: Trust president
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Zambia96
# Note: “Prime Minister” in Lesotho; “President” and “Prime Minister” in Burkina Faso, Cape Verde, Madagascar, Mali, Mozambique, Namibia, Senegal and Zimbabwe; “President” in Benin, Botswana, Ghana, Kenya, Liberia, Malawi, Nigeria, South Africa, Tanzania, Uganda, and Zambia.
ab4$trustPresident <- ab4$Q49A
ab4$trustPresident[ab4$trustPresident == -1] <- NA
ab4$trustPresident[ab4$trustPresident == 9] <- NA
ab4$trustPresident[ab4$trustPresident == 998] <- NA # refused to answer
# table(ab4$trustPresident)
# Trust in Parliament
# Question Number: Q49B
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: Parliament?
# Variable Label: Trust parliament/national assembly
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Adapted from Zambia96
# Note: “National Assembly” in Benin, Burkina Faso, Cape Verde, Liberia, Madagascar, Malawi, Mali, Mozambique, Nigeria, Tanzania, Uganda, Zambia; “Parliament” in Botswana, Ghana, Kenya, Lesotho, Namibia, Senegal, and South Africa; “House of Assembly” in Zimbabwe.
#table(ab4$Q49B)
ab4$trustParliament <- ab4$Q49B
ab4$trustParliament[ab4$trustParliament == -1] <- NA
ab4$trustParliament[ab4$trustParliament == 9] <- NA
ab4$trustParliament[ab4$trustParliament == 998] <- NA # refused to answer
#table(ab4$trustParliament)
# Trust in Ruling Party
# Question Number: Q49E
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: The Ruling Party?
# Variable Label: Trust the ruling party
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Adapted from Zambia96
#table(ab4$Q49E)
ab4$trustRP <- ab4$Q49E
ab4$trustRP[ab4$trustRP == -1] <- NA
ab4$trustRP[ab4$trustRP == 9] <- NA
ab4$trustRP[ab4$trustRP == 998] <- NA # refused to answer
#table(ab4$trustRP)
# Trust Traditional Leaders
# Question Number: Q49I
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: Traditional leaders
# Variable Label: Trust traditional leaders
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Zambia 96
#table(ab4$Q49I)
#ab4$trustTL <- ab4$Q49I
#ab4$trustTL[ab4$trustTL == -1] <- NA
#ab4$trustTL[ab4$trustTL == 7] <- NA
#ab4$trustTL[ab4$trustTL == 9] <- NA
#table(ab4$trustTL)
# Ethnic Group Treated Unfairly
# Question Number: Q82
# Question: How often are ___________s [R’s Ethnic Group] treated unfairly by the government?
# Variable Label: Ethnic group treated unfairly
# Values: 0-3, 7, 9, 998, -1
# Value Labels: 0=Never, 1=Sometimes, 2=Often, 3=Always, 7=Not applicable, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: SAB
# Note: Interviewer probed for strength of opinion. If respondent did not identify any group on this question – that is, if they “Refused to answer” (998), said “Don’t know” (999), or “Ghanaian only” (990) – then the interviewer marked “Not applicable” for questions 80-83 and continued to question 84.
#table(ab4$Q82)
ab4$treatedUnfairly <- ab4$Q82
ab4$treatedUnfairly[ab4$treatedUnfairly == -1] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 7] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 9] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 998] <- NA # refused to answer
#table(ab4$treatedUnfairly)
myvars <- c("COUNTRY", "Statename", "ccode", "RESPNO", "age",
"edu", "primary", "secondary", "tertiary", "language", "year", "identity", "rural", "female", "employment", "employed", "democracy",
"democracyInCountry", "satisfiedDemInCountry", "trustPresident", "trustParliament", "trustRP", "treatedUnfairly")
ab4 <- ab4[myvars]
head(ab4)
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary language
## 1 1 Benin 434 BEN0001 38 4 1 0 0 100
## 2 1 Benin 434 BEN0002 46 2 0 0 0 104
## 3 1 Benin 434 BEN0003 28 4 1 0 0 101
## 4 1 Benin 434 BEN0004 30 3 1 0 0 100
## 5 1 Benin 434 BEN0005 23 4 1 0 0 100
## 6 1 Benin 434 BEN0006 24 4 1 0 0 100
## year identity rural female employment employed democracy democracyInCountry
## 1 2008 2 0 1 0 0 3 4
## 2 2008 4 0 0 1 0 3 4
## 3 2008 NA 0 1 2 1 3 4
## 4 2008 5 0 0 1 0 2 3
## 5 2008 5 0 1 1 0 3 2
## 6 2008 5 0 0 1 0 2 3
## satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1 4 3 1 1 0
## 2 4 1 1 0 0
## 3 2 3 2 2 NA
## 4 2 1 1 0 0
## 5 2 3 3 1 0
## 6 3 1 2 2 1
Now I want to do the same for Round 5 and Round 6 of Afrobarometer. I do not replicate the code below, but it is otherwise identical in execution to the code for Round 4.
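Because the same missing-value recoding recurs for every item and every round, it can be wrapped in a small helper. The following is a minimal sketch on toy data: `recode_missing()` is a hypothetical function introduced here for illustration and is not part of the original pipeline.

```r
# Hypothetical helper: convert a survey item to numeric and set its
# non-substantive codes (missing, don't know, refused) to NA
recode_missing <- function(x, missing_codes) {
  x <- as.numeric(x)
  x[x %in% missing_codes] <- NA
  x
}

# Toy stand-in for the education item (Q89 in Round 4)
q89 <- c(0, 3, 5, 8, 99, 998, -1)
edu <- recode_missing(q89, missing_codes = c(-1, 99, 998))

# Same attainment thresholds as used for Round 4 above
primary   <- ifelse(edu >= 3, 1, 0)
secondary <- ifelse(edu >= 5, 1, 0)
tertiary  <- ifelse(edu >= 8, 1, 0)

data.frame(edu, primary, secondary, tertiary)
```

Passing the codes explicitly keeps each item's codebook values visible at the call site, which matters because the "don't know" code differs across items (9, 99, or 999).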
myvars <- c("COUNTRY", "Statename", "ccode", "RESPNO", "age",
"edu", "primary", "secondary", "tertiary", "language", "year", "identity", "rural", "female", "employment", "employed", "democracy",
"democracyInCountry", "satisfiedDemInCountry", "trustPresident", "trustParliament", "trustRP", "treatedUnfairly")
ab5 <- ab5[myvars]
head(ab5)
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary language
## 1 1 Algeria 615 ALG0001 48 5 1 1 0 5
## 2 1 Algeria 615 ALG0002 36 5 1 1 0 2
## 3 1 Algeria 615 ALG0003 34 5 1 1 0 5
## 4 1 Algeria 615 ALG0004 23 6 1 1 0 5
## 5 1 Algeria 615 ALG0005 41 2 0 0 0 5
## 6 1 Algeria 615 ALG0006 38 4 1 0 0 5
## year identity rural female employment employed democracy democracyInCountry
## 1 2013 NA 1 0 3 1 2 1
## 2 2013 NA 1 1 3 1 3 1
## 3 2013 NA 1 0 3 1 3 2
## 4 2013 NA 1 1 3 1 2 3
## 5 2013 NA 1 0 3 1 3 3
## 6 2013 NA 1 1 0 0 3 NA
## satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1 1 3 1 0 NA
## 2 2 2 1 1 NA
## 3 3 2 2 1 NA
## 4 NA 2 1 0 NA
## 5 3 2 0 0 NA
## 6 NA 2 NA NA NA
myvars <- c("COUNTRY", "Statename", "ccode", "RESPNO", "age",
"edu", "primary", "secondary", "tertiary", "language", "year", "identity", "rural", "female", "employment", "employed", "democracy",
"democracyInCountry", "satisfiedDemInCountry", "trustPresident", "trustParliament", "trustRP", "treatedUnfairly")
ab6 <- ab6[myvars]
head(ab6)
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary language
## 1 1 Algeria 615 ALG0001 27 2 0 0 0 1420
## 2 1 Algeria 615 ALG0002 30 2 0 0 0 1420
## 3 1 Algeria 615 ALG0003 62 8 1 1 1 1420
## 4 1 Algeria 615 ALG0004 30 5 1 1 0 1420
## 5 1 Algeria 615 ALG0005 35 4 1 0 0 1420
## 6 1 Algeria 615 ALG0006 21 7 1 1 0 1420
## year identity rural female employment employed democracy democracyInCountry
## 1 2015 NA 1 0 3 1 1 NA
## 2 2015 2 1 1 0 0 3 1
## 3 2015 NA 1 0 0 0 3 2
## 4 2015 NA 1 1 3 1 3 3
## 5 2015 NA 1 0 3 1 2 2
## 6 2015 NA 1 1 1 0 3 2
## satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1 NA 3 2 2 NA
## 2 3 1 1 1 0
## 3 3 1 1 1 0
## 4 3 2 1 2 NA
## 5 3 2 0 1 NA
## 6 3 2 1 1 NA
The three Afrobarometer rounds (4-6) are now identical in structure: each contains exactly the variables listed in myvars above.
Because the columns are also in the same order, I could rbind() them; however, the “RESPNO” values would then be duplicated across rounds. Therefore, the first thing I do is append the interview year to each RESPNO.
ab4$RESPNO <- as.character(ab4$RESPNO)
ab4$RESPNO <- str_c(ab4$RESPNO, "-", ab4$year)
ab5$RESPNO <- as.character(ab5$RESPNO)
ab5$RESPNO <- str_c(ab5$RESPNO, "-", ab5$year)
ab6$RESPNO <- as.character(ab6$RESPNO)
ab6$RESPNO <- str_c(ab6$RESPNO, "-", ab6$year)
head(ab4$RESPNO) # Example
## [1] "BEN0001-2008" "BEN0002-2008" "BEN0003-2008" "BEN0004-2008" "BEN0005-2008"
## [6] "BEN0006-2008"
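Whether the suffixed IDs are in fact unique across rounds can be checked before stacking. A minimal sketch on made-up IDs, using base R `paste()` in place of `str_c()`:

```r
# Toy IDs: the same respondent numbers appear in two rounds
r4 <- data.frame(RESPNO = c("BEN0001", "BEN0002"), year = c(2008, 2008))
r5 <- data.frame(RESPNO = c("BEN0001", "BEN0002"), year = c(2011, 2012))

# Append the interview year to each ID
r4$RESPNO <- paste(r4$RESPNO, r4$year, sep = "-")
r5$RESPNO <- paste(r5$RESPNO, r5$year, sep = "-")

ids <- c(r4$RESPNO, r5$RESPNO)
anyDuplicated(ids) # 0 means every ID is unique across rounds
```

One caveat worth checking in the real data: `str_c()` returns NA whenever one of its inputs is NA, so a respondent with a missing interview year would lose their ID entirely.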
Now, I want to use the Linking Ethnic Data from Africa (LEDA) package to treat each respondent’s language as an indicator of ethnicity, which I can then link to other datasets (such as EPR). The Ethnic Power Relations dataset includes country-year information on ethnic groups and their relative political status (Monopoly, Dominant, Senior Partner, Junior Partner, Powerless, Discriminated, Irrelevant).
LEDA() lets me produce a dataset that includes the language name from Afrobarometer and its corresponding ethnic group from EPR. Here’s an example.
leda <- LEDA$new()
# Retrieve dataset dictionary
list.dict <- leda$get_list_dict()
## Link all Afrobarometer groups to EPR data for round 4
setlink.ab4 <- leda$link_set(lists.a = list(type = c("Afrobarometer"),
round = 4, marker = "language"),
lists.b = list(type = c("EPR")),
link.level = "dialect",
by.country = T,
drop.a.threshold = 0,
drop.b.threshold = 0,
drop.ethno.id = F)
#Subsetting year
setlink.ab4 <- setlink.ab4[which(setlink.ab4$b.year==2008),]
# Now for Round 5
setlink.ab5 <- leda$link_set(lists.a = list(type = c("Afrobarometer"),
round = 5, marker = "language"),
lists.b = list(type = c("EPR")),
link.level = "dialect",
by.country = T,
drop.a.threshold = 0,
drop.b.threshold = 0,
drop.ethno.id = F)
#Subsetting year
setlink.ab5<- setlink.ab5[which(setlink.ab5$b.year==2011),]
## Now for Round 6
setlink.ab6 <- leda$link_set(lists.a = list(type = c("Afrobarometer"),
round = 6, marker = "language"),
lists.b = list(type = c("EPR")),
link.level = "dialect",
by.country = T,
drop.a.threshold = 0,
drop.b.threshold = 0,
drop.ethno.id = F)
#Subsetting year
setlink.ab6 <- setlink.ab6[which(setlink.ab6$b.year==2015),]
## Have a look
head(setlink.ab4[, c("a.group", "b.group", "a.type", "b.type")])
## a.group b.group a.type b.type
## 39 Adja Southwestern (Adja) Afrobarometer EPR
## 97 Adja Southwestern (Adja) Afrobarometer EPR
## 166 Adja Southeastern (Yoruba/Nagot and Goun) Afrobarometer EPR
## 213 Adja Southwestern (Adja) Afrobarometer EPR
## 282 Adja Southeastern (Yoruba/Nagot and Goun) Afrobarometer EPR
## 329 Adja Southwestern (Adja) Afrobarometer EPR
Next, I need to load the corresponding “Language ID Number” and “Language Name” from Afrobarometer. The following Excel documents were created from the Afrobarometer codebooks. In short, I copied the delimited list of ID=language pairs from each codebook and used Excel to split it into individual rows and columns.
lang.r4 <- read_excel("languages_r4.xlsx")
lang.r5 <- read_excel("languages_r5.xlsx")
lang.r6 <- read_excel("languages_r6.xlsx")
# Each of these documents contains a "language" column, which corresponds to the language ID numbers from the Afrobarometer language question.
# Each also contains a "languageName" column, which gives the name of the language for each ID in the Afrobarometer codebook.
Let’s run a few checks to see how well the language names from the codebooks match those in LEDA().
link.ab4 <- setlink.ab4[, c("a.cowcode", "a.iso3c", "a.group", "b.group", "a.type", "b.type")]
link.ab4 <- link.ab4[!duplicated(link.ab4), ]
setdiff(link.ab4$a.group, lang.r4$languageName)
## [1] "Fuls"
## [2] "Moore"
## [3] "Senoufo"
## [4] "Arabe"
## [5] "Khassonke"
## [6] "Malinke"
## [7] "Soninke/ Sarakoll"
## [8] "Sonrhai"
## [9] "Mang'anja"
## [10] "Oshiwambo"
## [11] "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe"
link.ab5 <- setlink.ab5[, c("a.cowcode", "a.iso3c", "a.group", "b.group", "a.type", "b.type")]
link.ab5 <- link.ab5[!duplicated(link.ab5), ]
setdiff(link.ab5$a.group, lang.r5$languageName)
## [1] "Moore" "Senoufo"
## [3] "Baoule" "Bete"
## [5] "Godie" "Guere"
## [7] "Diakanke" "konianke"
## [9] "Maasai / Samburu" "Meru / Embu"
## [11] "\"Official\" Malagasy" "Khassonke"
## [13] "Malinke" "Peulh / Fulfude"
## [15] "Soninke / Sarakolle" "Sonrhai"
## [17] "Chimang'anja" "Oshiwambo (Oshindonga/Oshikwanyama)"
## [19] "Beri beri" "Zarrma/Songhai"
## [21] "Kabye"
link.ab6 <- setlink.ab6[, c("a.cowcode", "a.iso3c", "a.group", "b.group", "a.type", "b.type")]
link.ab6 <- link.ab6[!duplicated(link.ab6), ]
setdiff(link.ab6$a.group, lang.r6$languageName)
## [1] "Moore" "Senoufo"
## [3] "Baoule" "Bete"
## [5] "Godie" "Guere"
## [7] "Bangangte" "Foufoulde"
## [9] "Mbede" "Myene"
## [11] "Nzebi/Metie" "Punu/Merie"
## [13] "Malgache << officiel >>" "Malgache avec specificite regionale"
## [15] "Khassonke" "Malinke"
## [17] "Soninke/Sarakole" "Portuguese"
## [19] "Zarma/Songhai"
So there is some mismatch across the lists, but not much. I therefore correct the spelling of any languages that are obvious matches, i.e. those that differ by a single letter (which I confirmed online to be an alternative spelling) or by an accent mark (which LEDA() omits from all spellings). I edit the spellings in my Excel documents so that they match the language names as spelled in LEDA().
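Candidate spelling matches of this kind can also be surfaced automatically before they are reviewed by hand. The sketch below is an illustration only, using made-up name lists and base R's `adist()` (edit distance); the actual corrections in this document were made manually in Excel.

```r
# Toy name lists (made up for illustration)
leda_names     <- c("Moore", "Senoufo", "Malinke")
codebook_names <- c("More", "Senufo", "Malinke", "Fon")

# Names in LEDA with no exact counterpart in the codebook
unmatched <- setdiff(leda_names, codebook_names)

# Edit distance between each unmatched name and every codebook name
d <- adist(unmatched, codebook_names, ignore.case = TRUE)

# Closest codebook spelling for each unmatched name
candidates <- data.frame(
  leda     = unmatched,
  codebook = codebook_names[apply(d, 1, which.min)],
  distance = apply(d, 1, min)
)
candidates # review by hand before changing any spelling
```

A small distance only suggests a match. Two different languages can have near-identical names, so each candidate still needs the kind of manual confirmation described above.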
lang.r4 <- read_excel("languages_r4_sp_fixed.xlsx")
setdiff(link.ab4$a.group, lang.r4$languageName) # Remaining mis-matches
## [1] "Arabe"
## [2] "Senufo/ Mianka"
## [3] "Soninke/ Sarakoll"
## [4] "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe"
lang.r5 <- read_excel("languages_r5_sp_fixed.xlsx")
setdiff(link.ab5$a.group, lang.r5$languageName) # Remaining mis-matches
## [1] "\"Official\" Malagasy" "Senufo"
lang.r6 <- read_excel("languages_r6_sp_fixed.xlsx")
setdiff(link.ab6$a.group, lang.r6$languageName) # Remaining mis-matches
## [1] "Senufo"
So at this point I have three datasets for each Afrobarometer round:
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary
## 1 1 Benin 434 BEN0001-2008 38 4 1 0 0
## 2 1 Benin 434 BEN0002-2008 46 2 0 0 0
## 3 1 Benin 434 BEN0003-2008 28 4 1 0 0
## 4 1 Benin 434 BEN0004-2008 30 3 1 0 0
## 5 1 Benin 434 BEN0005-2008 23 4 1 0 0
## 6 1 Benin 434 BEN0006-2008 24 4 1 0 0
## language year identity rural female employment employed democracy
## 1 100 2008 2 0 1 0 0 3
## 2 104 2008 4 0 0 1 0 3
## 3 101 2008 NA 0 1 2 1 3
## 4 100 2008 5 0 0 1 0 2
## 5 100 2008 5 0 1 1 0 3
## 6 100 2008 5 0 0 1 0 2
## democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1 4 4 3 1
## 2 4 4 1 1
## 3 4 2 3 2
## 4 3 2 1 1
## 5 2 2 3 3
## 6 3 3 1 2
## trustRP treatedUnfairly
## 1 1 0
## 2 0 0
## 3 2 NA
## 4 0 0
## 5 1 0
## 6 2 1
## # A tibble: 6 x 2
## language languageName
## <dbl> <chr>
## 1 1 English
## 2 2 French
## 3 3 Portuguese
## 4 4 Kiswahili
## 5 100 Fon
## 6 101 Adja
## a.cowcode a.iso3c a.group
## 39 434 BEN Adja
## 166 434 BEN Adja
## 1210 434 BEN Bariba
## 1268 434 BEN Dendi
## 1333 434 BEN Fon
## 1630 434 BEN Goun
## b.group
## 39 Southwestern (Adja)
## 166 Southeastern (Yoruba/Nagot and Goun)
## 1210 Northern (Bariba, Peul, Ottamari, Yoa-Lokpa, Dendi, Gourmanchema)
## 1268 Northern (Bariba, Peul, Ottamari, Yoa-Lokpa, Dendi, Gourmanchema)
## 1333 South/Central (Fon)
## 1630 Southeastern (Yoruba/Nagot and Goun)
## a.type b.type
## 39 Afrobarometer EPR
## 166 Afrobarometer EPR
## 1210 Afrobarometer EPR
## 1268 Afrobarometer EPR
## 1333 Afrobarometer EPR
## 1630 Afrobarometer EPR
colnames(link.ab4)[1] <- "ccode"
colnames(link.ab4)[2] <- "StatenameCond"
colnames(link.ab4)[3] <- "languageName"
colnames(link.ab4)[4] <- "group"
ab4 <- join(ab4, lang.r4, by = "language")
Now I want to merge EPR with the link.ab# list.
# So there are several iterations of aggregating EPR.
# Let's first look at their structure of it.
EPR <- read.csv("EPR-2014.csv")
head(EPR)
## gwid statename from to group groupid gwgroupid umbrella
## 1 2 United States 1946 1965 Whites 1000 201000 NA
## 2 2 United States 1946 1965 Latinos 2000 202000 NA
## 3 2 United States 1946 1965 African Americans 3000 203000 NA
## 4 2 United States 1946 1965 Asian Americans 4000 204000 NA
## 5 2 United States 1946 1965 American Indians 5000 205000 NA
## 6 2 United States 1946 1965 Arab Americans 6000 206000 NA
## size status reg_aut
## 1 0.6910 MONOPOLY
## 2 0.1250 IRRELEVANT
## 3 0.1240 DISCRIMINATED false
## 4 0.0360 IRRELEVANT
## 5 0.0078 POWERLESS true
## 6 0.0042 IRRELEVANT
# As you can see, it's in a condensed from/to format, so I expand it to one row per group-year so that I can subset by year.
EPR$year <- mapply(seq, EPR$from, EPR$to, SIMPLIFY=FALSE)
EPR <- EPR %>%
  unnest(year) %>%
  select(-from, -to)
#Subsetting year, lets do just 2008 for AB4 as of now
EPR <- EPR[which(EPR$year==2008),]
EPR$ccode <- countrycode(EPR$statename, "country.name", "cown")
## Warning in countrycode(EPR$statename, "country.name", "cown"): Some values were not matched unambiguously: Serbia
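The expansion from periods to years used above can be illustrated on a toy table. This is a base R sketch with made-up rows, shown as an alternative to the `mapply()` and `unnest()` approach rather than a replacement for it:

```r
# Toy EPR-style rows: one row per group and period (values are illustrative)
epr_toy <- data.frame(group  = c("A", "B"),
                      from   = c(2006, 2008),
                      to     = c(2008, 2009),
                      status = c("DOMINANT", "POWERLESS"))

# Number of years covered by each period
n_years <- epr_toy$to - epr_toy$from + 1

# Repeat each row once per covered year, then fill in the year
long <- epr_toy[rep(seq_len(nrow(epr_toy)), n_years), ]
long$year <- unlist(Map(seq, epr_toy$from, epr_toy$to))
rownames(long) <- NULL

# One row per group-year; subsetting to a survey year is now a simple filter
long[long$year == 2008, c("group", "year", "status")]
```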
## Subset just countries in AB4.
#table(ab4$ccode)
ab4ccode <- as.data.frame(table(ab4$ccode))
ab4ccode <- as.vector(ab4ccode$Var1)
#head(ab4ccode) #list of country codes (COW) in Ab4
EPR <- subset(EPR, EPR$ccode %in% ab4ccode)
#table(EPR$ccode)
link.ab4 <- merge(link.ab4, EPR, by = c("ccode", "group"))
# At this point I export the file to Excel and manually delete duplicate rows that do not differ in the relevant information.
# Specifically, I am looking for duplicates of "languageName" that otherwise contain the same relevant information from EPR.
write.csv(link.ab4, file = "link_ab4.csv")
Given how I link languages using LEDA(), some languages in Afrobarometer may correspond to multiple ethnic groups in the same country, and each of those ethnic groups may have a different status. Therefore, I adopt the following coding rules.
In the picture below, you can see that the language “Adja” (from the Afrobarometer Wave 4 language response) corresponds to two ethnic groups in Benin: groups in the Southwest and Southeast. The same applies to the language “Goun”. In this case, regardless of the language-ethnic group pairing, the “status” does not change. In other words, all individuals who speak “Adja” (in the SW or SE) are “Junior Partners” and all individuals who speak “Goun” are “Junior Partners”. In such cases, I simply collapse the observations (i.e., remove one row of each pair, so that I don’t get duplicates when merging).
Example 1 - Excel Screencap
The next example gives us two other possibilities. In the first case (light green), the language “Akan” is linked to two ethnic groups in Ghana: the Asante (Akan) and the “Other Akans”. Here, the Asante are “Senior Partner” and the Other Akans are “Junior Partner”. In such cases, I prioritize whether or not the groups have power (i.e., I don’t intend to differentiate between senior and junior partners). Therefore, I delete the “Other Akans” group.
Alternatively, the language “Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe” (dark green) in Nigeria applies to two ethnic groups, the Ijaw and the Ogoni; however, the Ijaw are “Junior Partner” and the Ogoni are “Powerless”. Therefore, I delete both, as I am unable to differentiate them in the analysis.
Example 2 - Excel Screencap
By and large, the majority of countries had no duplicates. Of those that did have duplicates, it was only 1-2 groups. The exception was Namibia - in which nearly all languages coincided with multiple ethnic groups that ultimately had different statuses. I still followed the above rules.
Example 3 - Excel Screencap
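The coding rules above were applied by hand in Excel. For reference, they can also be written down as code. The sketch below is an assumption-laden illustration, not the procedure actually used: `resolve_links()` is a hypothetical helper, the rows are toy data with a shortened language label for Nigeria, and the set of statuses counted as "having power" is my reading of the rules.

```r
# Statuses treated as "having power" (an assumption for this sketch),
# ordered from most to least senior
in_power <- c("MONOPOLY", "DOMINANT", "SENIOR PARTNER", "JUNIOR PARTNER")

# Hypothetical helper: resolve languages linked to several ethnic groups
resolve_links <- function(link) {
  key <- paste(link$ccode, link$languageName, sep = "|")
  pieces <- lapply(split(link, key), function(g) {
    status <- as.character(g$status)
    # Rule 1: all linked groups share a status, so collapse to one row
    if (length(unique(status)) == 1) {
      return(g[1, , drop = FALSE])
    }
    # Rule 2: statuses differ but all groups hold power, so keep the most senior
    if (all(status %in% in_power)) {
      return(g[which.min(match(status, in_power)), , drop = FALSE])
    }
    # Rule 3: mixed or otherwise ambiguous statuses, so drop the language
    g[0, , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy rows mirroring the three examples discussed above
link_toy <- data.frame(
  ccode        = c(434, 434, 452, 452, 475, 475),
  languageName = c("Adja", "Adja", "Akan", "Akan", "Ijaw", "Ijaw"),
  group        = c("Southwestern (Adja)", "Southeastern (Yoruba/Nagot and Goun)",
                   "Asante (Akan)", "Other Akans", "Ijaw", "Ogoni"),
  status       = c("JUNIOR PARTNER", "JUNIOR PARTNER",
                   "SENIOR PARTNER", "JUNIOR PARTNER",
                   "JUNIOR PARTNER", "POWERLESS")
)

fixed <- resolve_links(link_toy)
fixed
```

Encoding the rules this way would make the choices reproducible across rounds, but the output should still be compared against the manually edited file, since the sketch cannot capture case-by-case judgment.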
link.ab4.fixed <- read.csv("link_ab4_fixed.csv")
link.ab4.fixed$X <- NULL
ab4 <- join(ab4, link.ab4.fixed, by = c("ccode", "languageName"))
At this point, I’ve fully merged the Afrobarometer Wave 4 dataset with EPR: individuals are connected to ethnic groups, as defined in EPR, through their language, with links made at the dialect level. Let’s take a look at how many individuals in Afrobarometer Wave 4 now have corresponding information in EPR.
## [1] 27713
## [1] 8481
## [1] 19232
## [1] 0.6939703
Nearly 70% of respondents in Afrobarometer Wave 4 have a corresponding EPR group. The information for the missing 30% likely exists, but I had to drop it because their languages correspond to ethnic groups with conflicting statuses. I do hope to return to this in the future, but for now I move forward as a proof of concept.
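The code that produced the four numbers above is not shown. A coverage check of this kind could look like the following sketch, which uses a toy data frame and assumes that an unlinked respondent has `NA` in the `status` column:

```r
# Toy merged data: status is NA when no EPR group could be linked
toy <- data.frame(RESPNO = paste0("R", 1:10),
                  status = c(rep("JUNIOR PARTNER", 7), rep(NA, 3)))

n_total   <- nrow(toy)
n_missing <- sum(is.na(toy$status))
n_matched <- n_total - n_missing

c(total = n_total, missing = n_missing, matched = n_matched,
  share = n_matched / n_total)

# The reported Wave 4 counts are internally consistent
8481 + 19232  # equals the total of 27713
19232 / 27713 # roughly 0.694
```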
At this point, I repeat the above steps (beginning with loading EPR) two more times: once for Afrobarometer Wave 5 (2011) and once for Afrobarometer Wave 6 (2015). I do not reproduce that code below.
As above, there are cases where one language coincides with multiple ethnic groups. This is particularly true in North Africa, as shown below.
In the case of Morocco, Arabic coincides with two ethnic groups: Arabs and Sahrawis. Arabs are dominant, and Sahrawis are discriminated. However, Arabic-speaking Sahrawis make up only .016 of the population. Similar issues arise with Arabic being the sole language in Sudan and Egypt.
Therefore, the only change I make is this: if multiple ethnic groups are linked to a single language, and one of them accounts for less than .1 of the population (i.e., under 10%, and often much less) and is powerless, I delete that group in favor of the ethnic group that is much larger and has power. This better captures scenarios where very small marginalized ethnic groups (who are likely not even picked up by Afrobarometer surveys) speak the same language as larger empowered groups. Therefore, in the above, I delete the information from rows 277 and 278.
At this point in time, I have the Afrobarometer Wave 4-6 surveys merged with EPR. For roughly 70% of all respondents, I have their corresponding Ethnic Group Status information (which is not included in Afrobarometer).
myvars <- c("COUNTRY", "Statename", "ccode", "RESPNO", "age",
"edu", "primary", "secondary", "tertiary", "language", "year", "identity", "rural", "female", "employment", "employed", "democracy", "democracyInCountry", "satisfiedDemInCountry", "trustPresident", "trustParliament", "trustRP", "treatedUnfairly", "languageName", "group", "size", "status")
ab4 <- ab4[myvars]
ab5 <- ab5[myvars]
ab6 <- ab6[myvars]
save(ab4, file = "ab4_final.Rda")
save(ab5, file = "ab5_final.Rda")
save(ab6, file = "ab6_final.Rda")
# Stack waves 4 and 5, then append wave 6.
aball <- bind_rows(ab4,ab5)
aball.new <- bind_rows(aball,ab6)
# aball.new holds all three waves.
save(aball.new, file = "ab_all_final_new.Rda")
head(aball)
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary
## 1 1 Benin 434 BEN0001-2008 38 4 1 0 0
## 2 1 Benin 434 BEN0002-2008 46 2 0 0 0
## 3 1 Benin 434 BEN0003-2008 28 4 1 0 0
## 4 1 Benin 434 BEN0004-2008 30 3 1 0 0
## 5 1 Benin 434 BEN0005-2008 23 4 1 0 0
## 6 1 Benin 434 BEN0006-2008 24 4 1 0 0
## language year identity rural female employment employed democracy
## 1 100 2008 2 0 1 0 0 3
## 2 104 2008 4 0 0 1 0 3
## 3 101 2008 NA 0 1 2 1 3
## 4 100 2008 5 0 0 1 0 2
## 5 100 2008 5 0 1 1 0 3
## 6 100 2008 5 0 0 1 0 2
## democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1 4 4 3 1
## 2 4 4 1 1
## 3 4 2 3 2
## 4 3 2 1 1
## 5 2 2 3 3
## 6 3 3 1 2
## trustRP treatedUnfairly languageName group
## 1 1 0 Fon South/Central (Fon)
## 2 0 0 Yoruba Southeastern (Yoruba/Nagot and Goun)
## 3 2 NA Adja Southwestern (Adja)
## 4 0 0 Fon South/Central (Fon)
## 5 1 0 Fon South/Central (Fon)
## 6 2 1 Fon South/Central (Fon)
## size status
## 1 0.330 JUNIOR PARTNER
## 2 0.185 JUNIOR PARTNER
## 3 0.150 JUNIOR PARTNER
## 4 0.330 JUNIOR PARTNER
## 5 0.330 JUNIOR PARTNER
## 6 0.330 JUNIOR PARTNER
To note, the “ab_all_final.Rda” dataset is the individual respondent information from Afrobarometer that can be used to tackle Question 1.
Please see the “clott_egd_q1_rmark.Rmd” file for a preliminary data analysis for Question 1.
Let’s try to create an aggregate now. So “aball” contains three waves of Afrobarometer:
The respondents were surveyed at different times, so their ages are not standardized, and I need to fix this before backtracking age-cohort profiles. I therefore increase everyone’s age depending on when the survey was conducted, standardizing to 2015 - the last year for which we have information. For instance, a 35-year-old who was surveyed in 2008 would (presumably, if alive) be 42 in 2015. Of course, this creates issues if someone was already elderly in 2008 (say ages 85+); however, we can keep them in the survey since we are assuming all education would be attained at a younger age. Keeping their observations helps bolster our estimates for earlier years.
From this information, my preliminary attempt at getting educational attainment rates per country group is to do the following:
This method of backtracking group-year information is based upon the assumption that education (at least primary and secondary) will be completed in the first 18 years of respondents’ lives, on average. Tertiary education will still be captured by individuals older than 18.
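The age standardization and the backtracking assumption can be illustrated with a small worked example. This is a minimal sketch: the helper names `standardize_age` and `earliest_year` are hypothetical, and the cutoff of 18 is the assumption stated above. The example respondent (age 38, surveyed in 2008) mirrors the first row of the survey output.

```r
# Age a respondent would have in the reference year.
standardize_age <- function(age, survey_year, ref_year = 2015) {
  age + (ref_year - survey_year)
}

# Earliest year whose group average a respondent contributes to,
# assuming schooling is complete by `school_age` (18 in the text).
earliest_year <- function(age_ref, ref_year = 2015, school_age = 18) {
  ref_year - (age_ref - school_age)
}

age_2015 <- standardize_age(age = 38, survey_year = 2008)  # 45
first_yr <- earliest_year(age_2015)                        # 1988
print(c(age_2015 = age_2015, first_year = first_yr))
```

Under these assumptions the respondent counts toward every year from 1988, when they turned 18, through 2015. This is why averages for earlier years rest on progressively older and smaller cohorts.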
aball$timeDiff <- 2015 - aball$year
aball$ageUpdate <- aball$age + aball$timeDiff
# First, compute the average education level of each group as it stands in 2015.
egeAll <-aggregate(aball$edu, by=list(Statename = aball$Statename, group = aball$group),
FUN=mean, na.rm = T)
egeAll <- as.data.frame(egeAll)
egeAll <- egeAll[with(egeAll, order(Statename, group)),]
colnames(egeAll)[3] <- "2015.1"
# I want to first include information on their EPR status before I put in all the education information.
myvars <- c("Statename", "group", "status")
abGroupStatus <- aball[myvars]
abGroupStatus <- unique(abGroupStatus)
#egeAll <- join(egeAll, abGroupStatus, by = c("Statename", "group"))
#egeAll$included <- ifelse(egeAll$status == "MONOPOLY" | egeAll$status == "DOMINANT" | egeAll$status == "SENIOR PARTNER" | egeAll$status == "JUNIOR PARTNER", 1, 0)
year <- 2015
# Here's my loop. Where i reflects ages 18-65.
for (i in 18:65) {
abTemp <- aball[ which(aball$ageUpdate >= i),]
table.temp <- aggregate(abTemp$edu, by=list(Statename = abTemp$Statename, group = abTemp$group),
FUN=mean, na.rm = T)
table.temp <- as.data.frame(table.temp)
table.temp <- table.temp[with(table.temp, order(Statename, group)),]
colnames(table.temp)[3] <- year
year <- year-1
egeAll <- merge(egeAll, table.temp, by = c("Statename", "group"), all = T)
}
egeAll$`2015.1` <- NULL
head(egeAll)
## Statename group 2015 2014 2013
## 1 Algeria Arabs 3.195989 3.195989 3.195989
## 2 Algeria Berbers 2.436170 2.436170 2.436170
## 3 Benin South/Central (Fon) 2.376828 2.376828 2.376828
## 4 Benin Southeastern (Yoruba/Nagot and Goun) 2.083770 2.083770 2.083770
## 5 Benin Southwestern (Adja) 1.954301 1.954301 1.954301
## 6 Botswana Birwa 3.659574 3.659574 3.659574
## 2012 2011 2010 2009 2008 2007 2006 2005
## 1 3.182331 3.152953 3.113568 3.066116 3.030558 2.940331 2.856975 2.782822
## 2 2.436170 2.423913 2.423913 2.423913 2.247191 2.247191 2.204545 2.183908
## 3 2.376828 2.376828 2.360953 2.327945 2.330579 2.308523 2.265997 2.240260
## 4 2.083770 2.083770 2.066667 2.060109 2.033241 1.974212 1.917647 1.899696
## 5 1.954301 1.954301 1.929155 1.874302 1.882857 1.849275 1.781818 1.738854
## 6 3.659574 3.652174 3.613636 3.613636 3.613636 3.525000 3.512821 3.432432
## 2004 2003 2002 2001 2000 1999 1998 1997
## 1 2.754140 2.669355 2.609244 2.533432 2.472136 2.398374 2.352349 2.309187
## 2 2.116279 2.107143 2.096386 2.086420 2.075000 2.052632 1.945205 1.944444
## 3 2.214570 2.212587 2.173601 2.125926 2.057508 2.046053 2.038801 2.014679
## 4 1.880878 1.853821 1.783505 1.768421 1.728302 1.675676 1.690083 1.672489
## 5 1.720257 1.732203 1.696864 1.667857 1.661479 1.614458 1.534783 1.490909
## 6 3.388889 3.264706 3.187500 3.187500 2.931034 2.892857 2.892857 2.892857
## 1996 1995 1994 1993 1992 1991 1990 1989
## 1 2.225989 2.165029 2.126016 2.071429 2.053879 1.986333 1.940758 1.888614
## 2 1.943662 1.942857 1.850746 1.815385 1.825397 1.750000 1.750000 1.678571
## 3 2.009615 1.995918 1.983299 2.015982 1.959716 1.936275 1.964770 1.954155
## 4 1.656109 1.668246 1.691176 1.666667 1.620112 1.609195 1.607595 1.624161
## 5 1.421569 1.434555 1.430108 1.471264 1.429412 1.438272 1.430464 1.453237
## 6 2.892857 2.814815 2.730769 2.680000 2.434783 2.434783 2.285714 2.200000
## 1988 1987 1986 1985 1984 1983 1982 1981
## 1 1.864583 1.801105 1.747748 1.665605 1.651316 1.609589 1.521898 1.461847
## 2 1.588235 1.460000 1.372093 1.309524 1.292683 1.275000 1.135135 1.085714
## 3 1.938080 1.862745 1.864769 1.869231 1.888889 1.944206 1.909502 1.848341
## 4 1.657143 1.595420 1.507937 1.521368 1.531532 1.552381 1.572816 1.530612
## 5 1.477273 1.472000 1.394958 1.446429 1.452830 1.473684 1.373626 1.261905
## 6 1.823529 1.937500 1.937500 1.937500 1.733333 1.733333 1.733333 1.642857
## 1980 1979 1978 1977 1976 1975 1974 1973
## 1 1.365957 1.337719 1.273973 1.2537313 1.1516854 1.0952381 1.0931677 1.0764331
## 2 1.117647 1.090909 1.062500 0.8965517 0.6666667 0.5652174 0.5238095 0.5500000
## 3 1.855615 1.821229 1.844720 1.7054795 1.6250000 1.7142857 1.6886792 1.6739130
## 4 1.443038 1.361111 1.347826 1.3333333 1.2181818 1.1250000 0.8571429 0.8055556
## 5 1.209877 1.213333 1.349206 1.3500000 1.3508772 1.3653846 1.3600000 1.3478261
## 6 1.642857 1.769231 1.666667 1.6666667 1.3636364 1.3636364 1.3636364 1.3000000
## 1972 1971 1970 1969 1968
## 1 1.0206897 0.9777778 0.9047619 0.8403361 0.7619048
## 2 0.3888889 0.2941176 0.2500000 0.2142857 0.2307692
## 3 1.5662651 1.5500000 1.6086957 1.5606061 1.4918033
## 4 0.7352941 0.7575758 0.7419355 0.7419355 0.7666667
## 5 1.3333333 1.2857143 1.0810811 1.0882353 1.2692308
## 6 1.4444444 1.2500000 1.2500000 1.2500000 1.2500000
And voila! We have a dataset that has the ethnic group education level per country-year. Let’s melt it and add in the EPR information again. The trick is that the EPR information (whether a group was discriminated, powerless, in power, etc.) changes historically, so we actually want to merge this new dataset with the original EPR dataset. Then we can say that “X Group was discriminated in 1975, and their education level was Y.”
Then we can look at a couple figures of what we have.
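The idea of matching each group-year to the EPR status that held in that year can be illustrated on toy data. This is a minimal base R sketch with a hypothetical country, group, and values. It expands each EPR period (`from` to `to`) into one row per year and then left-joins the result onto a group-year education table.

```r
# Toy EPR-style table: one row per group-period (hypothetical values).
epr_periods <- data.frame(
  statename = c("Country X", "Country X"),
  group     = c("Group A", "Group A"),
  from      = c(1970, 1973),
  to        = c(1972, 1974),
  status    = c("DISCRIMINATED", "JUNIOR PARTNER"),
  stringsAsFactors = FALSE
)

# Repeat each period row once per year it covers, then attach the year.
n_years   <- epr_periods$to - epr_periods$from + 1
epr_years <- epr_periods[rep(seq_len(nrow(epr_periods)), n_years),
                         c("statename", "group", "status")]
epr_years$year <- unlist(Map(seq, epr_periods$from, epr_periods$to))
rownames(epr_years) <- NULL

# Toy group-year education table (hypothetical values).
edu_years <- data.frame(
  statename = "Country X", group = "Group A",
  year = 1971:1974, Edu = c(1.0, 1.1, 1.2, 1.3),
  stringsAsFactors = FALSE
)

# Left join: every education row is kept, even without an EPR match.
merged <- merge(edu_years, epr_years,
                by = c("statename", "group", "year"), all.x = TRUE)
print(merged)
```

In the toy result the group is coded as discriminated in 1971 and 1972 and as a junior partner in 1973 and 1974, so the same group can fall on either side of the power divide depending on the year.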
egeMelt <- melt(egeAll, id.vars = c("Statename", "group"))
colnames(egeMelt)[1] <- "statename"
colnames(egeMelt)[3] <- "year"
colnames(egeMelt)[4] <- "Edu"
head(egeMelt)
## statename group year Edu
## 1 Algeria Arabs 2015 3.195989
## 2 Algeria Berbers 2015 2.436170
## 3 Benin South/Central (Fon) 2015 2.376828
## 4 Benin Southeastern (Yoruba/Nagot and Goun) 2015 2.083770
## 5 Benin Southwestern (Adja) 2015 1.954301
## 6 Botswana Birwa 2015 3.659574
EPR <- read.csv("EPR-2014.csv")
EPR$year <- mapply(seq, EPR$from, EPR$to, SIMPLIFY=FALSE)
EPR <- EPR %>%
unnest(year) %>%
select(-from,-to)
EPR$ccode <- countrycode(EPR$statename, "country.name", "cown")
## Subset just countries in aball
#table(aball$ccode)
aballccode <- as.data.frame(table(aball$ccode))
aballccode <- as.vector(aballccode$Var1)
#head(aballccode) #list of country codes (COW) in aball
EPR <- subset(EPR, EPR$ccode %in% aballccode)
#table(EPR$ccode)
myvars <- c("statename", "group", "year", "status")
EPR <- EPR[myvars]
egeMelt <- join(egeMelt, EPR, by = c("statename", "group", "year"))
egeMelt$included <- ifelse(egeMelt$status == "MONOPOLY" | egeMelt$status == "DOMINANT" | egeMelt$status == "SENIOR PARTNER" | egeMelt$status == "JUNIOR PARTNER", 1, 0)
countryList <- egeMelt$statename
countryList <- unique(countryList)
plots <- list()
# Build one figure per country; seq_along() avoids indexing past the end of countryList.
for (i in seq_along(countryList)) {
  # Can do any country below.
  countryData <- na.omit(egeMelt[ which(egeMelt$statename==countryList[i]),])
  figure <- ggplot(countryData, aes(x = factor(year, levels = rev(levels(factor(year)))), y = Edu, group = as.factor(group), colour = as.factor(included))) +
    geom_point() +
    geom_line() +
    theme(legend.position = "bottom") +
    xlab("Year") +
    ggtitle(countryList[i]) +
    scale_x_discrete(breaks=seq(1970,2015, 5)) + labs(color='Included in Power')
  plots[[i]] <- figure
  #ggsave(figure, file=paste0("plot_", countryList[i],".png"))
  #print(figure)
}
As of now, I have a preliminary dataset that contains ethnic group educational attainment rates from 1968-2015 in the following countries:
## [1] "Algeria" "Benin" "Botswana" "Burkina Faso"
## [5] "Burundi" "Cameroon" "Cote d’Ivoire" "Egypt"
## [9] "Ghana" "Guinea" "Kenya" "Lesotho"
## [13] "Liberia" "Madagascar" "Malawi" "Mali"
## [17] "Mauritius" "Morocco" "Mozambique" "Namibia"
## [21] "Niger" "Nigeria" "Senegal" "Sierra Leone"
## [25] "South Africa" "Swaziland" "Tanzania" "Tunisia"
## [29] "Uganda" "Zambia" "Zimbabwe"
For each year, it also includes the EPR ethnic group status information (whether the group was a monopoly, dominant, senior partner, junior partner, powerless, discriminated, or irrelevant). I’ve coded that to be “in power” or “not in power”, as defined by EPR. Groups that are “monopoly, dominant, senior partner, or junior partner” are considered to be in power. With this information, I can make a figure for each country in which each line is an ethnic group over time.
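The in-power coding described above can be sketched as a small function. This is a minimal illustration and the name `code_included` is hypothetical. Unlike a chain of `==` comparisons, it makes the handling of missing statuses explicit by returning NA for them. Treating "IRRELEVANT" as not in power is an assumption that follows from the four-status definition in the text.

```r
# Statuses treated as "in power" in the text.
in_power_levels <- c("MONOPOLY", "DOMINANT", "SENIOR PARTNER", "JUNIOR PARTNER")

# Returns 1 (in power), 0 (not in power), or NA when status is missing.
code_included <- function(status) {
  ifelse(is.na(status), NA_integer_,
         as.integer(status %in% in_power_levels))
}

status   <- c("JUNIOR PARTNER", "DISCRIMINATED", NA, "IRRELEVANT")
included <- code_included(status)
print(included)
```

Keeping group-years without an EPR match as NA, rather than 0, prevents unmatched rows from being counted as excluded groups in any later analysis.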
The creation of this dataset simultaneously supports two of my papers for my dissertation.
The original purpose of this dataset was to be able to ultimately do a cross-country analysis of ethnic group educational attainment on likelihood of democratization. Whereas past analyses primarily focus on:
I will instead be able to:
My argument is that as the educational attainment of marginalized groups increases, so does the likelihood of democratization. EPR also includes extensive information on group size and other factors - so I could do a more extensive analysis.
A core assumption of my theory is that education uniquely impacts marginalized communities as opposed to advantaged communities. I argue that education will foster comparatively stronger pro-democratic attitudes among marginalized groups as democracy is a means to inclusion. Alternatively, education will foster state-support (pro-authoritarian) views among advantaged groups to protect the status-quo.
I have another working paper that uses the individual-level survey data from Afrobarometer to predict when individuals are more likely to identify with the state or with their ethnic group. I argue that education will lead marginalized individuals to be more likely to identify with their ethnic group, and advantaged individuals to be more likely to identify with their state.
Now that I have the merged Afrobarometer data that also includes EPR information, I can provide a hierarchical analysis of political attitudes dependent upon individuals’ membership in excluded/included ethnic groups.
## COUNTRY Statename ccode RESPNO age edu primary secondary tertiary
## 1 1 Benin 434 BEN0001-2008 38 4 1 0 0
## 2 1 Benin 434 BEN0002-2008 46 2 0 0 0
## 3 1 Benin 434 BEN0003-2008 28 4 1 0 0
## 4 1 Benin 434 BEN0004-2008 30 3 1 0 0
## 5 1 Benin 434 BEN0005-2008 23 4 1 0 0
## 6 1 Benin 434 BEN0006-2008 24 4 1 0 0
## language year identity rural female employment employed democracy
## 1 100 2008 2 0 1 0 0 3
## 2 104 2008 4 0 0 1 0 3
## 3 101 2008 NA 0 1 2 1 3
## 4 100 2008 5 0 0 1 0 2
## 5 100 2008 5 0 1 1 0 3
## 6 100 2008 5 0 0 1 0 2
## democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1 4 4 3 1
## 2 4 4 1 1
## 3 4 2 3 2
## 4 3 2 1 1
## 5 2 2 3 3
## 6 3 3 1 2
## trustRP treatedUnfairly languageName group
## 1 1 0 Fon South/Central (Fon)
## 2 0 0 Yoruba Southeastern (Yoruba/Nagot and Goun)
## 3 2 NA Adja Southwestern (Adja)
## 4 0 0 Fon South/Central (Fon)
## 5 1 0 Fon South/Central (Fon)
## 6 2 1 Fon South/Central (Fon)
## size status timeDiff ageUpdate
## 1 0.330 JUNIOR PARTNER 7 45
## 2 0.185 JUNIOR PARTNER 7 53
## 3 0.150 JUNIOR PARTNER 7 35
## 4 0.330 JUNIOR PARTNER 7 37
## 5 0.330 JUNIOR PARTNER 7 30
## 6 0.330 JUNIOR PARTNER 7 31
There are a handful of obstacles yet to overcome: